NSF PAR Search | NSF Public Access Repository

A New Principle for Tuning-Free Huber Regression

https://doi.org/10.5705/ss.202019.0045

Wang, Lili; Zheng, Chao; Zhou, Wen; Zhou, Wen-Xin (January 2021, Statistica Sinica)
Full Text Available
Thouless conductances of a three-dimensional quantum Hall system

https://doi.org/10.1103/PhysRevB.102.064208

Zheng, Chao; Yang, Kun; Wan, Xin (August 2020, Physical Review B)

Full Text Available
Autoscaling High-Throughput Workloads on Container Orchestrators

https://doi.org/10.1109/CLUSTER49012.2020.00024

Zheng, Chao; Kremer-Herman, Nathaniel; Shaffer, Tim; Thain, Douglas (September 2020, IEEE Conference on Cluster Computing)
High-throughput computing (HTC) workloads seek to complete as many jobs as possible over a long period of time. Such workloads require efficient execution of many parallel jobs and can occupy a large number of resources for a long time. As a result, full utilization is the normal state of an HTC facility. The widespread use of container orchestrators eases the deployment of HTC frameworks across different platforms, which also provides an opportunity to scale up HTC workloads with almost infinite resources on the public cloud. However, the autoscaling mechanisms of container orchestrators are primarily designed to support latency-sensitive microservices, and result in unexpected behavior when presented with HTC workloads. In this paper, we design a feedback autoscaler, High Throughput Autoscaler (HTA), that leverages the unique characteristics of the HTC workload to autoscale the resource pools used by HTC workloads on container orchestrators. HTA takes into account a reference input, the real-time status of the job queue, as well as two feedback inputs: the resource consumption of jobs and the resource initialization time of the container orchestrator. We implement HTA using the Makeflow workload manager, the WorkQueue job scheduler, and the Kubernetes cluster manager. We evaluate its performance on both CPU-bound and IO-bound workloads. The evaluation results show that, by using HTA, we improve resource utilization by 5.6× with a slight increase in execution time (about 15%) for a CPU-bound workload, and shorten the workload execution time by up to 3.65× for an IO-bound workload.
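The abstract above names one reference input (queue status) and two feedback inputs (per-job resource consumption and resource initialization time). As a rough illustration of how such inputs could be combined, here is a minimal sketch of a sizing rule. It is not HTA's actual algorithm, which the abstract does not specify; the function `desired_workers`, all of its parameters, and the "skip scale-up if the queue drains first" policy are assumptions made for this example.

```python
import math


def desired_workers(queued_jobs, running_jobs, current_workers,
                    cores_per_job, cores_per_worker,
                    avg_job_runtime_s, worker_init_time_s, max_workers):
    """Return a target worker count (hypothetical policy, not HTA's)."""
    total_jobs = queued_jobs + running_jobs
    if total_jobs == 0:
        # Nothing queued or running: release every worker.
        return 0

    # Feedback input 1: measured per-job consumption sets jobs per worker.
    jobs_per_worker = max(1, int(cores_per_worker // cores_per_job))

    # Reference input: workers needed to run all known jobs at once.
    needed = math.ceil(total_jobs / jobs_per_worker)
    target = min(needed, max_workers)

    if target > current_workers and current_workers > 0:
        # Estimate how long the existing workers need to drain the queue.
        slots = current_workers * jobs_per_worker
        drain_time_s = math.ceil(queued_jobs / slots) * avg_job_runtime_s
        # Feedback input 2: if the queue drains before a new worker could
        # finish initializing, scaling up would only add idle capacity.
        if drain_time_s <= worker_init_time_s:
            return current_workers
    return target


if __name__ == "__main__":
    # 80 queued one-core jobs, 2 four-core workers, slow jobs: scale up.
    print(desired_workers(80, 8, 2, 1, 4, 60, 120, 10))
    # 4 queued short jobs: the queue drains before new workers are ready.
    print(desired_workers(4, 8, 2, 1, 4, 10, 120, 10))
```

A real controller would call such a rule periodically and would also have to smooth noisy measurements; both are left out of this sketch.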
Full Text Available
Deploying High Throughput Scientific Workflows on Container Schedulers with Makeflow and Mesos

https://doi.org/10.1109/CCGRID.2017.9

Zheng, Chao; Tovar, Ben; Thain, Douglas (May 2017, IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing)

Workflows are a widely used abstraction for describing large scientific applications and running them on distributed systems. However, most workflow systems have been silent on the question of what execution environment each task in the workflow is expected to run in. Consequently, a workflow may run successfully in the environment in which it was created, but fail on other platforms due to differences in the execution environment. Container-based schedulers have recently arisen as a potential solution to this problem, adopting containers to distribute computing resources and deliver well-defined execution environments to applications. In this paper, we consider how to connect workflow systems to container schedulers with minimal performance loss and higher system efficiency. As an example of current technology, we use Makeflow and Mesos. We present five design challenges, and address them by using four configurations that connect the workflow system to the container scheduler at different levels of the infrastructure. In order to take full advantage of the resource sharing schema of Mesos, we enable the resource monitor of Makeflow to dynamically update the task resource requirement. We explore the performance of a large bioinformatics workflow, and observe that using Makeflow, Work Queue and the resource monitor together not only increases the transfer throughput but also achieves the highest resource usage rate.
Full Text Available
Wharf: Sharing Docker Images in a Distributed File System

https://doi.org/10.1145/3267809.3267836

Zheng, Chao; Rupprecht, Lukas; Tarasov, Vasily; Thain, Douglas; Mohamed, Mohamed; Skourtis, Dimitrios; Warke, Amit S.; Hildebrand, Dean (January 2018, ACM Symposium on Cloud Computing)

Full Text Available
Towards Scalable and Dynamic Social Sensing Using A Distributed Computing Framework

https://doi.org/10.1109/ICDCS.2017.196

Zhang, Daniel Yue; Zheng, Chao; Wang, Dong; Thain, Doug; Mu, Xin; Madey, Greg; Huang, Chao (June 2017, IEEE International Conference on Distributed Computing Systems)

With the rapid growth of online social media and ubiquitous Internet connectivity, social sensing has emerged as a new crowdsourcing application paradigm of collecting observations (often called claims) about the physical environment from humans or devices on their behalf. A fundamental problem in social sensing applications lies in effectively ascertaining the correctness of claims and the reliability of data sources without knowing either of them a priori, which is referred to as truth discovery. While significant progress has been made to solve the truth discovery problem, some important challenges have not been well addressed yet. First, existing truth discovery solutions did not fully solve the dynamic truth discovery problem where the ground truth of claims changes over time. Second, many current solutions are not scalable to large-scale social sensing events because of the centralized nature of their truth discovery algorithms. Third, the heterogeneity and unpredictability of the social sensing data traffic pose additional challenges to the resource allocation and system responsiveness. In this paper, we developed a Scalable Streaming Truth Discovery (SSTD) solution to address the above challenges. In particular, we first developed a dynamic truth discovery scheme based on Hidden Markov Models (HMM) to effectively infer the evolving truth of reported claims. We further developed a distributed framework to implement the dynamic truth discovery scheme using Work Queue in the HTCondor system. We also integrated the SSTD scheme with an optimal workload allocation mechanism to dynamically allocate the resources (e.g., cores, memory) to the truth discovery tasks based on their computation requirements. We evaluated SSTD through real-world social sensing applications using Twitter data feeds. The evaluation results on three real-world data traces (i.e., Boston Bombing, Paris Shooting and College Football) show that the SSTD scheme is scalable and outperforms the state-of-the-art truth discovery methods in terms of both effectiveness and efficiency.
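The abstract above describes inferring the evolving truth of a claim with a Hidden Markov Model. To make that idea concrete, the sketch below runs textbook Viterbi decoding on a toy two-state model, where the hidden state is whether a claim is currently true and each observation is whether a report supports or denies it. This is generic HMM decoding, not the SSTD scheme itself; the state names, observation labels, and all probabilities are invented for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence (log-space Viterbi)."""
    # Log-probability of the best path ending in each state after step 0.
    best = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
            for s in states}
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for obs in observations[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            score, arg = max((prev[r] + math.log(trans_p[r][s]), r)
                             for r in states)
            best[s] = score + math.log(emit_p[s][obs])
            pointers[s] = arg
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy model (assumed values): the truth rarely flips, reports are noisy.
STATES = ("true", "false")
START = {"true": 0.5, "false": 0.5}
TRANS = {"true": {"true": 0.9, "false": 0.1},
         "false": {"true": 0.1, "false": 0.9}}
EMIT = {"true": {"support": 0.8, "deny": 0.2},
        "false": {"support": 0.2, "deny": 0.8}}

if __name__ == "__main__":
    reports = ["support", "support", "deny", "deny", "deny"]
    print(viterbi(reports, STATES, START, TRANS, EMIT))
```

With these toy numbers the decoded sequence switches from "true" to "false" once the denials begin, which is the kind of time-varying ground truth the abstract refers to.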
Full Text Available